Licensed under the Creative Commons attribution-noncommercial license. Please share and remix noncommercially, mentioning its origin.
This document was produced using pomp version 0.69.3.
This tutorial aims to help you get started using pomp as a suite of tools for analysis of time series data based on dynamical systems models. First, we give some conceptual background regarding the class of models—partially observed Markov processes—that pomp handles. We then discuss some preliminaries: installing the package and so on. Next, using a basic question about ecological population regulation as an example, we load some data and implement some models as R objects of class pomp. Finally, we illustrate some of the package’s capabilities by using its algorithms to fit and compare the models using various inference methods.
As its name implies pomp is focused on partially observed Markov process models. These are also known as state-space model, hidden Markov models, and stochastic dynamical systems models. Such models consist of an unobserved stochastic state process, connected to the data via an explicit model of the observation process. We refer to the former as the process model and the latter as the measurement model. Each model is, in fact, a probability distribution. If we let \(Y_t\) denote the measurement process at time \(t\) and \(X_t\) the state process, then by definition, the process model is determined by the density \(f_{X_t|X_{t-1}}\) and the measurement process by the density \(f_{Y_t|X_t}\). We think of the data, \(y^*_t\), \(t=1,\dots,T\) as being a single realization of the \(Y\) process. To implement a POMP model in pomp, we have to specify the measurement and process distributions.
Note however, that, for each of the process and the measurement model there are two distinct operations that we might wish to perform:
We refer to the simulation of \(f_{X_t|X_{t-1}}\) as the rprocess component of the POMP model, the evaluation of \(f_{X_t|X_{t-1}}\) as the dprocess component, the simulation of \(f_{Y_t|X_t}\), as the rmeasure component, and the evaluation of \(f_{Y_t|X_t}\) as the dmeasure component. Methods that make no use of the dprocess component are called “plug-and-play” methods. At present, pomp is focused on such methods, so there is no reason to focus on the dprocess component any further. In the following, we will illustrate and explain how one specifies the rprocess, rmeasure, and dmeasure components of a model in pomp. We will illustrate this using some simple examples.
The following is a schematic of the structure of a POMP showing causal relations between the process model, the measurement model, and the data. The key perspective to keep in mind is that the model is to be viewed as the process that generated the data.
Here is another POMP model schematic, showing dependence among model variables.
State variables, \(X\), at time \(t\) depend only on state variables at the previous timestep. Measurements, \(Y\), at time \(t\) depend only on the state at that time. Formally, \[\mathrm{Prob}[X_t|X_0,\dots,X_{t-1},Y_1,\dots,Y_{t-1}]=\mathrm{Prob}[X_t|X_{t-1}]\] and \[\mathrm{Prob}[Y_t|X_0,\dots,X_{t},Y_1,\dots,Y_{t-1}]=\mathrm{Prob}[Y_t|X_{t}],\] for all \(t=1,\dots,T\).
To get started, we must install pomp, if it is not already installed. The package can be downloaded from CRAN, but the latest version is always available at the package homepage on R-Forge. Run the following to install this to download and install the latest version.
install.packages("pomp",repos="http://R-Forge.R-Project.org")
In this document, we will implement pomp models using the package’s Csnippet facility. This allows the user to write model components using snippets of C code, which is then compiled and linked into a running R session. This typically affords a manyfold increase in computation time. It is possible to avoid Csnippets entirely by writing model components as R functions, but the resulting implementations are typically too slow to be of practical use.
Note to Windows users.
To use Csnippets, you must be able to compile C codes. Compilers are not typically installed on Windows systems. To achieve full functionality of pomp on a Windows computer, therefore, you must install the Rtools suite. See http://cran.r-project.org/bin/windows/Rtools for downloads and instructions.
Note to Mac users.
Although a variety of compilers are freely available for the Mac platform, they are not typically installed by default.
The Csnippet functionality in essence requires that you be able to use R CMD SHLIB. To see if you can do this, execute the following in an R session.
cat("#include <R.h>
void hello (void) {
Rprintf(\"hello world!\\n\");
}",file="hello.c")
system("R CMD SHLIB hello.c")
dyn.load(paste0("hello",.Platform$dynlib.ext))
.C("hello",PACKAGE="hello")
If you see the “hello world” message printed, your system should support Csnippet.
We will illustrate pomp by performing a limited data analysis on a set of bird abundance data. The data are from annual censuses of a population of Parus major (Great Tit) in Wytham Wood, Oxfordshire. They were retrieved as dataset #10163 from the Global Population Dynamics Database version 2 (NERC Centre for Population Biology, Imperial College, 2010). The original source is McCleery and Perrins (1991).
We load the data by doing
parus.dat <- read.csv(text="
year,pop
1960,148
1961,258
1962,185
1963,170
1964,267
1965,239
1966,196
1967,132
1968,167
1969,186
1970,128
1971,227
1972,174
1973,177
1974,137
1975,172
1976,119
1977,226
1978,166
1979,161
1980,199
1981,306
1982,206
1983,350
1984,214
1985,175
1986,211"
)
Here is a plot of these data:
ggplot(data=parus.dat,mapping=aes(x=year,y=pop))+
geom_line()+geom_point()+
expand_limits(y=0)+
theme_classic()
To keep things simple, let’s attempt to explain these data using a very simple model of stable but stochastic population dynamics, the logistic, or Verhulst-Pearl, equation with environmental stochasticity. We’ll write this model as a stochastic differential equation (SDE), specifically an Ito diffusion: \[dN = r\,N\,\left(1-\frac{N}{K}\right)\,dt+\sigma\,N\,dW(t),\] where \(N\) is the population size, \(r\) is a fixed rate, the so-called “Malthusian parameter”, \(K\) is the population’s “carrying capacity”, \(\sigma\) describes the intensity of extrinsic stochastic forcing on the system, and \(dW\) is an increment of a standard Wiener process. [Those unfamiliar with Wiener processes and Ito diffusions will find it useful to visualize \(dW(t)\), for each time \(t\), as a normal random variable with mean zero and standard deviation \(\sqrt{dt}\).] To implement this model in pomp, we must tell the package how to simulate this model. The easiest way to simulate such an SDE is via the Euler-Maruyama method. In this approximation, we take a large number of very small steps, each of duration \(\Delta t\). At each step, we hold the right-hand side of the above equation constant, compute \(\Delta N\) using that equation, and increment \(N\) accordingly. pomp gives us the euler.sim function to assist us in implementing the Euler-Maruyama method. To use it, we must encode the computations that take a single step. The best way to do this is to write a snippet of code in the C language. The following is such a snippet for the stochastic logistic equation above.
step.fun <- Csnippet("
double dW = rnorm(0,sqrt(dt));
N += r*N*(1-N/K)*dt+sigma*N*dW;
")
This is just a snippet of C code: not all the variables are declared and the context of the snippet is not specified. When given this snippet, pomp will provide the necessary declarations and context, compile the resulting C code, dynamically link to it, and use it to simulate realizations of the process model. We cause all this to happen when we construct an object of class pomp via a call to the constructor function, which is also named pomp:
parus <- pomp(data=parus.dat,time="year",t0=1959,
rprocess=euler.sim(step.fun=step.fun,delta.t=1/365),
statenames="N",paramnames=c("r","K","sigma"))
In the above, we’ve specified an Euler-Maruyama time-step of about one day. The t0 argument specifies that we are going to treat the stochastic state process as being initialized in year 1959. We use the statenames and paramnames arguments to indicate which of the undeclared variables in the Csnippet step.fun are state variables and which are fixed parameters. Since dW is a local variable, we must provide an ordinary C declaration for it. Note that dt is a variable that is defined in the context of this snippet; it is actually provided by pomp and will contain the size of the Euler step. The rnorm function is part of the R API: see the manual on “Writing R Extensions” for a description of this and the other distribution functions provided as part of the R API. Finally, note that the state variable N is over-written by this snippet: it’s value when the first line is executed is overwritten by the second line.
A full set of rules for writing pomp C snippets is given in the package help ?Csnippet.
With the process model in place, we can simulate the process model using the simulate function and plot several realizations for a given set of parameters.
simStates <- simulate(parus,nsim=10,params=c(r=0.2,K=200,sigma=0.5,N.0=200),states=TRUE)
Although it is the process model that is usually the focus of an investigator’s interest, the measurement model is equally important because it connects the process model to the data. For this example, let’s model the observations using a Poisson distribution: \[\mathrm{pop}_t \sim \mathrm{Poisson}(\phi\,N),\] where \(\mathrm{pop}_t\) is our census population size at time \(t\), \(N\) is the true population size, and \(\phi\) is a scaling parameter.
The following snippet encodes the rmeasure component for this model.
rmeas <- Csnippet("
pop = rpois(phi*N);
")
In this snippet of code, N is the state variable, phi is another parameter, and pop is the name of our observable, as defined in the dataset parus.dat. The rpois function is part of the R API: it takes a single argument—the Poisson distribution’s parameter—and returns a pseudo-random draw from the Poisson distribution with that parameter. As with the step.fun snippet above, execution of this snippet causes pop to be overwritten.
To fold this into our pomp object, we do
parus <- pomp(parus,rmeasure=rmeas,paramnames="phi",statenames="N")
Note that, again, we must tell pomp which of the variables is a parameter and which is a state variable. We can now simulate from the full POMP model and plot the results.
sim <- simulate(parus,params=c(r=0.2,K=200,phi=0.5,sigma=0.5,N.0=200),
nsim=10,obs=TRUE,states=TRUE)
The following snippet encodes the dmeasure component.
dmeas <- Csnippet("
lik = dpois(pop,phi*N,give_log);
")
and we fold it into the pomp object via
parus <- pomp(parus,dmeasure=dmeas,paramnames="phi",statenames="N")
In the dmeas snippet, dpois again comes from the R API. It takes three arguments, the datum, the Poisson parameter, and give_log. When give_log=0, dpois returns the Poisson likelihood; when give_log=1, dpois returns the log of this likelihood. When this snippet is executed, pomp will provide the value of give_log according to its needs. It is the user’s responsibility to make sure that the correct value is returned.
Since we now have both the rprocess and the dmeasure components in place, we can perform a particle filtering operation to estimate the log likelihood at any given point in parameter space. For example, we can do
pf <- pfilter(parus,Np=1000,params=c(r=0.2,K=200,phi=0.5,sigma=0.5,N.0=200))
logLik(pf)
## [1] -156.1319
A deterministic skeleton of the logistic model is obtained by taking \(\sigma\to 0\), which leads to the deterministic differential equation \[\frac{dN}{dt} = r\,N\,\left(1-\frac{N}{K}\right).\] The following snippet encodes this deterministic skeleton and folds it into the pomp object.
skel <- Csnippet("
DN = r*N*(1-N/K);
")
parus <- pomp(parus,skeleton=skel,skeleton.type="vectorfield",statenames="N",paramnames=c("r","K"))
Note that in this snippet, DN is filled with the value of the time-derivative of N. See the package help (?Csnippet) for a complete set of rules for writing C snippets.
With the deterministic skeleton in place we can generate several trajectories of the skeleton using trajectory, as follows, and plot the result.
pars <- parmat(c(r=1,K=200,phi=0.5,sigma=0.5,N.0=20),5)
pars["N.0",] <- seq(20,300,length=5)
traj <- trajectory(parus,params=pars,times=seq(1959,1970,by=0.01))
Let us demonstrate the implementation of discrete-time process models by replacing the continuous-time logistic equation used above with the stochastic Beverton-Holt model \[N_{t+1}=\frac{a\,N_t}{1+b\,N_t}\,\varepsilon_t,\] where \(a\) and \(b\) are parameters and \(\varepsilon_t\sim\mathrm{Lognormal}(-\tfrac{1}{2}\sigma^2,\sigma)\).
Since we are now dealing with a discrete-time model, euler.sim will no longer construct an appropriate rprocess component. Instead, we can use discrete.time.sim, which also takes a C snippet encoding the one-step simulation. The following snippet simulates a single iteration of the stochastic Beverton-Holt map
bh.step <- Csnippet("
double eps = rlnorm(-sigma*sigma/2,sigma);
N = a*N/(1+b*N)*eps;
")
A corresponding skeleton is the deterministic Beverton-Holt map obtained by setting \(e_t=1\) in the above equation. A snippet that implements this map is
bh.skel <- Csnippet("
DN = a*N/(1+b*N);
")
Note that, as in the continuous case, we indicate the new value of the state variable by prepending D to the variable name.
We create a new pomp object based on this process model but with the same rmeasure and dmeasure components as in parus by executing
parus.bh <- pomp(parus,rprocess=discrete.time.sim(bh.step,delta.t=1),skeleton=bh.skel,skeleton.type="map",skelmap.delta.t=1,statenames="N",paramnames=c("a","b","sigma"))
The following codes test the above by computing some simulations, a trajectory of the skeleton, and running a particle filtering operation.
coef(parus.bh) <- c(a=1.1,b=5e-4,sigma=0.5,N.0=30,phi=1)
sim <- simulate(parus.bh)
traj <- trajectory(parus.bh)
pf <- pfilter(parus.bh,Np=1000)
We can specify model-specific parameter transformations using C snippets. The following implements log transformation of the \(r\) and \(K\) parameters of the logistic model and folds them into the pomp object parus.
logtrans <- Csnippet("
Tr = log(r);
TK = log(K);
")
exptrans <- Csnippet("
Tr = exp(r);
TK = exp(K);
")
parus <- pomp(parus,toEstimationScale=logtrans,
fromEstimationScale=exptrans,
paramnames=c("r","K"))
Trajectory matching is the method of fitting a deterministic model to data assuming independent errors. In pomp, the function traj.match searches parameter space to find parameters under which the likelihood of the data given a trajectory of the deterministic skeleton is maximized.
tm <- traj.match(parus,start=c(r=1,K=200,phi=1,N.0=200,sigma=0.5),
est=c("r","K","phi"),transform=TRUE)
signif(coef(tm),3)
## r K phi N.0 sigma
## 3.6100 2340.0000 0.0845 200.0000 0.5000
logLik(tm)
## [1] -270.3827
We can simulate the fitted model and compare it against the data.
coef(tm,"sigma") <- 0
sim1 <- simulate(tm,nsim=10,as.data.frame=TRUE,include.data=TRUE)
ggplot(data=sim1,mapping=aes(x=time,y=pop,group=sim,alpha=(sim=="data")))+
scale_alpha_manual(name="",values=c(`TRUE`=1,`FALSE`=0.2),
labels=c(`FALSE`="simulation",`TRUE`="data"))+
geom_line()
Iterated filtering (Ionides et al. 2006, 2015) is a method for maximizing the likelihood by repeatedly applying a particle filter. The following codes apply the IF2 algorithm (Ionides et al. 2015).
mf <- mif2(parus,Nmif=50,Np=1000,cooling.fraction=0.8,
rw.sd=rw.sd(r=0.02,K=0.02,phi=0.02,sigma=0.02),transform=TRUE)
mf <- mif2(mf)
mle <- coef(mf); mle
## r K phi N.0 sigma
## 3.0798228 206.2653610 1.0523361 200.0000000 0.6779096
logmeanexp(replicate(5,logLik(pfilter(mf))),se=TRUE)
## se
## -143.9365865 0.5221863
sim2 <- simulate(mf,nsim=10,as.data.frame=TRUE,include.data=TRUE)
ggplot(data=sim2,mapping=aes(x=time,y=pop,group=sim,alpha=(sim=="data")))+
scale_alpha_manual(name="",values=c(`TRUE`=1,`FALSE`=0.2),
labels=c(`FALSE`="simulation",`TRUE`="data"))+
geom_line()
The first command runs 50 iterations of the algorithm; the second re-runs the algorithm, starting from where the first run ended up. The next line extracts and displays the MLE. The fourth command runs 5 replicate particle filters to compute the log likelihood at the estimated parameters and averages these appropriately to get an estimate of this likelihood and of the standard Monte Carlo error. Finally, the above plots the data and 10 simulated realizations of the model process on the same axes.
The following codes cause 5 parallel pMCMC chains to be run, beginning at the MLE obtained above.
dprior <- Csnippet("
lik = dunif(r,0,5,1)+dunif(K,100,800,1)+dunif(phi,0,2,1)+
dunif(sigma,0,2,1);
lik = (give_log) ? lik : exp(lik);
")
parus <- pomp(parus,dprior=dprior,paramnames=c("r","K","phi","sigma"))
pchs <- foreach (i=1:5,.combine=c,
.options.multicore=mcopts) %dopar% {
pmcmc(parus,Nmcmc=1000,Np=100,start=mle,
proposal=mvn.diag.rw(c(r=0.02,K=0.02,phi=0.02,sigma=0.02)))
}
traces <- conv.rec(pchs,c("r","K","phi"))
require(coda)
plot(traces[,"r"])
plot(traces[,"K"])
plot(traces[,"phi"])
gelman.plot(traces)
gelman.diag(traces)
## Potential scale reduction factors:
##
## Point est. Upper C.I.
## r 2.32 4.56
## K 4.23 7.05
## phi 1.27 1.64
##
## Multivariate psrf
##
## 3.84
Ionides, E. L., C. Bretó, and A. A. King. 2006. Inference for nonlinear dynamical systems. Proceedings of the National Academy of Sciences of the U.S.A. 103:18438–18443.
Ionides, E. L., D. Nguyen, Y. Atchadé, S. Stoev, and A. A. King. 2015. Inference for dynamic and latent variable models via iterated, perturbed Bayes maps. Proceedings of the National Academy of Sciences of the U.S.A. 112:719–724.
McCleery, R. H., and C. M. Perrins. 1991. Effects of predation on the numbers of great tits Parus major. Pages 129–147 in Bird population studies. relevence to conservation and management. Oxford University Press, Oxford.